HEWLETT-PACKARD COMPANY Atty Docket No.: 10001077-1 

Intellectual Property Administration 
P.O. Box 272400 

Fort Collins, Colorado 80527-2400 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

Inventor(s): Craig M. Wittenbrink Confirmation No.: 8406 

Serial No.: 09/881,424 Examiner: Woods, Eric V. 

Filed: June 14, 2001 Group Art Unit: 2672 

Title: SYSTEM FOR PROCESSING OVERLAPPING DATA 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

I, Craig M. Wittenbrink, hereby declare as follows. 

1. All statements set forth herein made on my personal knowledge are true, and all 
statements made on information and belief are believed to be true. 

2. I am the sole inventor of the patent application identified above. 

3. I conceived, reduced to practice, and had possession of the subject matter of the 
invention claimed in the patent application identified above (hereinafter "Claimed 
Invention") while an employee at the Hewlett-Packard Laboratories in Palo Alto, California, 
USA. 

4. Attached hereto as Exhibit A is a true and correct copy of an invention disclosure 
document I submitted to the real-party in interest, Hewlett-Packard Co., on November 11, 
1999 and a true and correct copy of my lab notebook dating between August 31, 1998 and 
October 8, 1999, which are hereby incorporated into and form part of the present Declaration 
in their entireties. The invention disclosure document included the attached document 
entitled "True Transparency with the Fragment Buffer Graphics Architecture" (hereinafter 
"True Transparency Article"), which I authored, and a copy of the attached lab notebook. 

5. Exhibit A clearly demonstrates that I was in complete possession of, that I had 
conceived of, and that I had actually reduced to practice, the Claimed Invention prior to July 
19, 2000, for at least the following reasons. 

a) I was in complete possession of and conceived of the Claimed Invention at 
least as early as November 10, 1999, the date on which I submitted the attached invention 
disclosure statement and True Transparency Article. For instance, a description of the 
claimed "fragment buffer" is at least disclosed on page 2 of the True Transparency Article. 
In addition, the claimed instructions and hardware, which are depicted in Figure 1 of the 
above-identified patent application, are shown as Figures 2 and 3 and described in Section 2 
(page 3) of the True Transparency Article. More particularly, at least Figures 2 and 3 and 
Section 2 of the True Transparency Article disclose all of the features of Claim 1 of the 
Claimed Invention. For instance, Figure 2 shows the claimed first storage (Z-buffer and 
Color buffer), the claimed fragment buffer which holds multiple fragments for overlapping 
data (Fragment buffer), one of instructions and hardware that causes a device to perform 
various functions (disclosure contained in Section 2), and the detection of the predetermined 
one of closest and furthest visible data for a pixel location is disclosed with respect to the 
example depicted in Figure 4. 

b) Figures 2-4 and Section 2 of the True Transparency Article also disclose all of 
the elements of Claims 11, 17, 19 and 22 of the present application. In addition, the 
disclosure contained in Section 5.1 of the True Transparency Article discloses all of the 
elements of Claims 11, 17, 19, and 22 of the present application. For instance, page 17 of the 
True Transparency Article refers to Figure 23 as showing a single pixel representing 
overlapping data. Page 17 also recites that rasterized fragments are inserted (or stored) into 
"two buffers to determine the closest opaque fragment and the furthest unoccluded 
transparent fragment." In addition, the pseudo-code shown in Figure 18, considered with the 
definitions provided on page 16 of the True Transparency Article, discusses the remaining 
steps claimed in Claims 11, 17, and 19 and the means for successively detecting and blending 
claimed in Claim 22 of the present application. 

c) In addition, I actually reduced the Claimed Invention to practice at least as 
early as November 10, 1999, as evidenced, for instance, by the disclosure contained in 
Section 5.1 of the True Transparency Article. Section 5.1 of the True Transparency Article 
describes in complete detail the elements claimed in the Claimed Invention. In addition, that 
section includes pseudo-code corresponding to the elements of the Claimed Invention. 

d) As further evidence that I actually reduced the Claimed Invention to practice 
as early as November 10, 1999, reference is made to various other correlations between the 
Claimed Invention and the True Transparency Article. For instance, Figure 2 of the present 
application is similar to Figure 3 in the True Transparency Article. In addition, Figure 4 of 
the present application contains information similar to the information contained in Table 1 
of the True Transparency Article. As another example, Figure 5 of the present invention is 
similar to Figure 4 in the True Transparency Article. 

e) As yet further evidence that I actually reduced the Claimed Invention to 
practice as early as November 10, 1999, I concluded in Section 6 of the True Transparency 
Article that "[t]he economical implementation of true transparency has been shown through 
the invention of the fragment buffer. A simulation was developed for the best fit for a 
traditional Z buffering architecture... [s]imulation results were shown, as partial validation of 
the design with several models." In addition, on October 8, 1999, my lab notebook contains 
the statement "[t]he state machine has been fully implemented and debugged. Several 
complex models were run through it." Clearly, therefore, as early as October 8, 1999, I had 
tested the Claimed Invention and had found it to function properly. 

I, Craig M. Wittenbrink, acknowledge that willful false statements and the like are 
punishable by fine or imprisonment, or both (18 U.S.C. 1001) and may jeopardize the 
validity of the application or any patent issuing thereof. 

SUBJECT: TRUE TRANSPARENCY WITH THE FRAGMENT BUFFER GRAPHICS ARCHITECTURE 

DATE: 11/09/99 

CC: 



a. prior solutions and their disadvantages 

Prior solutions include: 1) software reordering of graphics primitives at the application level, for 
example as necessary in OpenGL [OpenGL92]; 2) multipass rendering with a dedicated augmented 
frame buffer [Mammen89]; 3) limiting users to 4 levels with a hardware chip [Kelley94][Winner97]; 
4) software A-buffer rasterizer [Carpenter84]; 5) screen door transparency such as that implemented in 
the RealityEngine [Akeley93]; and 6) a sorting network and RAM buffering [Baker94]. 

The disadvantages of the software techniques (1, 2, 4) are the inefficiency of having an 
application sort primitives. The sorting is viewpoint dependent, so for 3D graphics rendering, as a 
viewpoint is altered a new sorting of all primitives would be done, or held in a previously created data 
structure such as an octree. Such presorting may require cutting primitives that are intersecting other 
primitives to break potential cycles. 

The disadvantages of the previous hardware techniques are either quality trade-offs, such as only 
supporting a fixed number of layers (3), reducing the spatial resolution via a dithering technique 
called screen door transparency, which also supports only a fixed number of transparent layers (5), or 
tremendous cost with dedicated memories and circuitry only for the sorting (6). 

There has been no prior solution that provides economical true transparency without changing 
the application, in a graphics 3D rendering architecture. 

b. problems solved by the invention 

The problem solved by the invention is how to compute a 3D graphics rendering of surfaces that 
are partially transparent, while providing the ease of use of a traditional Z-buffering architecture. The 
problem is a long standing one in graphics, and involves the proper sorting of the primitives in the 
depth ordering along view rays. A primary advantage of the invention is how to solve this in 
hardware, with an economical (few gates) method. Also, there are no trade-offs in quality, as any 
number of layers at a given pixel may be supported, and the memory cost is amortized across the 
entire screen. 

c. Description of the construction and operation of the invention 

The invention is fully disclosed in the attached HP Confidential technical report [Wittenbrink99]. 
This invention would be added as described to a 3D rendering graphics ASIC or ASICs, to 
implement in hardware true transparency. A proposed hardware architecture, as well as specific 
hardware control, are given in the referred technical report. 

d. Advantages of the invention over what has been done before 

The advantages of the invention over what has been done before are: the invention does not 
require modification of the application; the invention does not force the application to sort primitives 
(triangles) to work properly; the invention is economical, as it requires only simple modifications to 
the Z-comparison and compositing logic of existing [illegible]; the invention provides 
new features without compromising existing features; and the invention may also incorporate 
antialiasing. 

REFERENCES: 

[Akeley93] Kurt Akeley. RealityEngine Graphics. In Proceedings of SIGGRAPH, pages 109-116, 
Anaheim, August 1993. ACM. 

[Carpenter84] Loren Carpenter. The A-buffer, an antialiased hidden surface method. In 
Proceedings of SIGGRAPH, pages 103-108. ACM, July 1984. Vol. 18, No. 3. 

[Baker94] Stephen J. Baker, Dennis A. Cowdrey, Graham J. Olive, and Karl J. Wood. Image 
generator for generating perspective views from data defining a model having opaque and 
translucent features. United States Patent Number 5,363,475, Nov. 8, 1994. 

[Kelley94] Michael Kelley, Kirk Gould, Brent Pease, Stephanie Winner, and Alex Yen. Hardware 
accelerated rendering of CSG and transparency. In Proceedings of SIGGRAPH, pages 177-184, 
Orlando, FL, July 1994. ACM. 

[Mammen89] Abraham Mammen. Transparency and antialiasing algorithms implemented with 
the virtual pixel maps technique. IEEE Computer Graphics and Applications, 9(4):43-55, July 1989. 

[OpenGL92] OpenGL Architecture Review Board. OpenGL Reference Manual. Addison- 
Wesley, Reading, MA, 1992. 

[Winner97] Stephanie Winner, Michael Kelley, Brent Pease, and Alex Yen. Hardware accelerated 
rendering of antialiasing using a modified A-buffer algorithm. In Proceedings of SIGGRAPH, pages 
307-316, Los Angeles, CA, August 1997. ACM. 

[Wittenbrink99] Craig M. Wittenbrink. True Transparency with the Fragment Buffer Graphics 
Architecture. HP Labs Confidential Technical Report, HPL-99-[illegible], Nov. 1999. 

True Transparency with the Fragment Buffer Graphics Architecture 

Craig M. Wittenbrink 
Hewlett-Packard Labs 
1501 Page Mill Road 
Palo Alto, CA 94304 

November 10, 1999 



The fragment buffer is a new method for providing computation for true transparency of rendered 
fragments. True transparency is provided without altering the application, without requiring the 
application to sort the data, and without the deficiencies of previous methods such as screen door 
transparency and the A-Minus buffer. The fragment queue or fragment buffer can compute true 
transparency with any number of layers. A variant of the fragment buffer that was designed for minimal 
hardware complexity, with maximum algorithmic improvement, is [illegible]. Statistics are shown for a 
variety of different scenes using a trace based methodology, and an instrumented Mesa(TM) OpenGL 
implementation. The fragment buffer is shown to require from 2.1 to 3.6 times more memory than 
traditional Z-buffering to provide true transparency. Detailed hardware design is provided, including 
the state transition diagrams, next state table, and architectural schematics. The fragment buffer can 
also be used for antialiasing, and an example of Carpenter's classical A-buffer antialiasing is shown. 
A key invention of antialiasing is to modify Carpenter's recursive algorithm into an iterative 
front-to-back processing. 



1 Introduction 

The Fragment Buffer is about achieving greater graphics visual realism through novel use of resources. This 
paper discusses the architecture and shows examples of a rendering simulator that uses the fragment buffer. 
Sutherland, Sproull and Schumacker [11] explained hidden surface algorithms as sorting to determine what 
is visible on the screen. Object space primitives are sorted to screen space locations in X, Y and Z. Most 
architectures compare the Z location with the existing values in a Z buffer, and decide what can be thrown 
out, or overwritten into the Z buffer. The technique is simple, and fast in hardware. It has dominated 
graphics architectures for nearly two decades. But, there are deficiencies. It is difficult to use Z buffering 
with complex shading and/or texturing algorithms. The reason is that a large number of pixel values are 
overwritten, so expensive shading can be overly burdensome. Another primary difficulty is that Z buffering 
is a read modify write, and an actual sort is not being done. Therefore, true transparency is not possible 
efficiently on a Z buffering architecture. An additional difficulty of Z buffering is that antialiasing is expensive 
and requires expanding the Z buffer to the number of subsamples used for antialiasing. 
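
The read-modify-write behavior described above can be illustrated with a minimal sketch. It assumes that a smaller Z value is closer to the eye, and the function and variable names are hypothetical rather than taken from this report. Whatever the submission order, only the nearest fragment survives at a pixel, so the layers that transparency needs are never put in order.

```python
def zbuffer_write(color_buf, z_buf, x, y, color, z):
    """Conventional Z test: read the stored depth, compare, overwrite if closer."""
    if z < z_buf[y][x]:          # assumption: smaller z is closer to the eye
        z_buf[y][x] = z
        color_buf[y][x] = color


def render(fragments, width=1, height=1):
    """Rasterize (x, y, color, z) fragments in submission order."""
    color_buf = [[None] * width for _ in range(height)]
    z_buf = [[float("inf")] * width for _ in range(height)]
    for x, y, color, z in fragments:
        zbuffer_write(color_buf, z_buf, x, y, color, z)
    return color_buf


frags = [(0, 0, "red", 3.0), (0, 0, "green", 2.0), (0, 0, "blue", 1.0)]
print(render(frags)[0][0])          # nearest fragment only
print(render(frags[::-1])[0][0])    # same result for the reverse order
```

Both calls keep only the fragment at depth 1.0; the two farther fragments leave no trace in either buffer.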

Compute intensive shading can be done with a sort last approach [7]. If the sorting of an object primitive 
to the screen is left until the last step of the graphics pipeline, it is called a sort last technique, from Molnar 
et al.'s proposed parallel rendering taxonomy. PixelFlow [8] used sort last so that all pixels were determined 
to be visible or not, and then shaded, which makes shading and/or texturing work proportional to the frame 
buffer size, not the object complexity. As models become large, this is an important advantage. PixelFlow's 
primary difficulty is the bandwidth and subdivision necessary to composite entire screens between different 
graphics pipelines. A sort last approach solves the inefficiencies of the Z buffering architecture, but does not 
provide correct transparency. 

Improved methods for correct transparency have been investigated by Mammen [6], Carpenter [3], Kelley 
et al. [5, 12], and Baker et al. [2]. The proposed techniques are either software only [3], require multiple 
passes of rendering the geometry [5, 12, 6], and/or only render a fixed number of transparent levels correctly 
[5, 12]. Transparency is a challenging problem to solve in hardware, and other techniques such as screen 
door, or sorting the polygons back-to-front in the application, have been used. There are quality problems 
with screen door transparency [1], essentially a dithering technique, and requiring applications to send the 
polygons in sorted order is not general to legacy applications or easy to do. 

Improved methods for antialiasing have been investigated throughout the graphics literature, and numerous 
techniques exist. In hardware, antialiasing has been done most directly through supersampling [1, 8]. 
Adaptations of the A-buffer [3] to hardware have also been investigated, with partial implementations due 
to the generality of the A-buffer approach [12]. The difficulty of economically implementing antialiasing has 
meant that most graphics architectures support antialiasing with multiple passes through the geometry. 

Figure 1 on the bottom row shows what happens in OpenGL when rendering 3 transparent squares of 
red, green, and blue. A different image results from each different drawing order, even though the 3 squares 
have a fixed Z depth position. On the top row of Figure 1, a different drawing order does not impact the visual 
appearance. This shows the results of true transparency, with the invention of the fragment buffer. 

Figure 1: Top row, fragment buffer, same appearance. From near to far, the squares are ordered Blue, Green, 
Red. Bottom row, OpenGL, different every time. 
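
The order dependence shown in the bottom row of Figure 1 follows directly from blending fragments in submission order. The sketch below is only an illustration: it assumes simple source-over blending of non-premultiplied colors, an opacity of 0.5 for every square, and a black background, none of which are stated in this report, and the names are hypothetical.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Source-over blend of one non-premultiplied color onto a destination color."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


def blend_in_submission_order(layers, background=(0.0, 0.0, 0.0)):
    """Blend layers in the order they are drawn, with no depth sorting."""
    color = background
    for rgb, alpha in layers:
        color = over(rgb, alpha, color)
    return color


# Assumed opacity of 0.5 for each square.
RED = ((1.0, 0.0, 0.0), 0.5)
GREEN = ((0.0, 1.0, 0.0), 0.5)
BLUE = ((0.0, 0.0, 1.0), 0.5)

# Near to far the squares are Blue, Green, Red, so back-to-front is Red, Green, Blue.
correct = blend_in_submission_order([RED, GREEN, BLUE])
reversed_order = blend_in_submission_order([BLUE, GREEN, RED])
print(correct)         # (0.125, 0.25, 0.5)
print(reversed_order)  # (0.5, 0.25, 0.125)
```

Only the back-to-front order gives the color a viewer would expect; any other submission order changes the result even though the depths are unchanged.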

By the addition of a memory, called the fragment buffer, proper transparency ordering and antialiasing 
can be economically implemented in hardware. I show how, for example, correct transparency can be 
implemented in such an architecture. I also show how the A-buffer algorithm and adaptive antialiasing 
or nonuniform sampling can be implemented. Essentially, a frame buffer is used for storing the closest 
opaque fragment, or the furthest transparent fragment if there aren't any opaque fragments. For pixels with 
additional fragments, those fragments are sent to the buffer along with their X and Y location. In successive 
passes, the fragments are considered, and composited [9] (Porter and Duff) into the frame buffer. Only 1 
pass is needed for processing the geometry, so no extra storage is needed for geometry, and a single fragment 
buffer is shared for the entire screen. This amortization of the extra storage over the entire screen allows 
unique savings over techniques with large per pixel dedicated storage. For architectures that do screen based 
subdivision, such a fragment buffer fits in naturally. Bucketization of primitives and/or fragments reduces 
the storage requirements of the X and Y location, as it is addressed only within a tile. 
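
The saving from addressing a fragment only within a tile can be estimated with a short calculation. The screen size of 1280 by 1024 and the tile size of 32 by 32 used here are illustrative assumptions, not figures from this report.

```python
def address_bits(width, height):
    """Bits needed to address any (x, y) location in a width-by-height region."""
    return (width - 1).bit_length() + (height - 1).bit_length()


screen_bits = address_bits(1280, 1024)   # hypothetical full-screen resolution
tile_bits = address_bits(32, 32)         # hypothetical tile size
print(screen_bits, tile_bits)            # 21 10
```

Under these assumptions each queued fragment needs 10 location bits instead of 21.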

True transparency and adaptive antialiasing have only been attempted in hardware in high end graphics 
image generators for flight simulation [2]. The use of a fragment buffer is flexible and efficient for the 
calculation of proper transparency, without multiple passes of the geometry. Multiple passes of fragments 
are efficiently done, and many fragments are culled and eliminated with Z and occlusion testing. Screen 
space subdivision, which allows for the reduced communication requirements, also provides for reducing the 
amount of sorting necessary. And, antialiasing can be done efficiently without dedicating a large amount of 
memory per pixel. The buffer area may be partitioned to provide efficient separation of different types of 
fragments, and the novel ease of nonuniform sampling in a hardware pipeline. Experiments show the required 
memory to support true transparency with highly detailed models to be from 2.1 to 3.6 times more memory 
than a traditional Z buffer. Additionally, a reformulation of Carpenter's software recursive antialiasing 
technique shows how to perform iterative front-to-back antialiasing with this proposed hardware. Many 
issues need to be investigated, such as the [illegible] and the support of all OpenGL modes. 

But, the advantages, and the proposed performance levels, make the architecture an advance beyond what 
has been possible. 

The main inventions describe how to achieve correct ordering for transparency and how to achieve 
antialiasing economically. Section 2 describes the fragment buffer architecture in the context of a Z-buffering 
graphics rendering architecture. Section 3 shows the state transition diagrams, and next state tables, for 
the comparison logic. Section 4 shows the results with several test data sets. Section 5 discusses fragment 
buffer variants, and Section 6 concludes the paper. 



2 Fragment Buffer Graphics Hardware 

To explore the most economical hardware, an invention that augments a conventional Z-buffer is explored. 
Many variations of the invention are possible, and are discussed further in Section 5. The scenario is as 
follows: the architecture is a standard graphics pipeline with geometry processing (G), and rasterization 
(R). We add a fragment buffer, and a 2nd Z storage to the frame buffer. This configuration provides the 
maximum advantage with the minimal additional memory. Figure 2 shows the architecture. 

Figure 2: Graphics architecture schematic plus added fragment buffer, and second Z buffer. 

Figure 3 shows more details of the new processing. A fragment coming from rasterization or from the 
fragment buffer is multiplexed into the fragment buffer comparison and controller. The fragment buffer is 
considered to be a circular queue, and if the queue overflows because of excessive fragments then it can be 
paged to system memory. Because accesses are sequential and used on a first-in first-out (FIFO) basis, 
performance will degrade gracefully. A fragment is defined as a point sample with color, opacity, and depth 
resulting from the rasterization (R), as in RGBAZ. Depending on the fragment's opacity, depth, and the 
previous state of the frame buffer at that location, a fragment may be stored on the fragment buffer, 
composited into the frame buffer, or discarded. The fragment buffer comparison and controller is where the 
z-buffering comparison typically takes place. The z-buffering is now augmented and revised to provide true 
transparency and antialiasing. The processing is demonstrated through an example. 

Figure 3: Detailed diagram of interface to fragment buffer and frame buffer. 
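
The fragment record and the first-in first-out queue described in this section can be modelled in software as follows. This is a minimal sketch under stated assumptions, not the hardware design: the class names, the field layout, and the use of a second deque to stand in for pages spilled to system memory are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Point sample from rasterization: screen location plus RGBAZ."""
    x: int
    y: int
    r: float
    g: float
    b: float
    a: float   # opacity
    z: float   # depth


class FragmentQueue:
    """FIFO fragment queue with a fixed capacity and an overflow area.

    The overflow deque stands in for pages held in system memory.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.resident = deque()
        self.overflow = deque()

    def push(self, fragment):
        # Once anything has overflowed, later fragments follow it to keep FIFO order.
        if self.overflow or len(self.resident) >= self.capacity:
            self.overflow.append(fragment)
        else:
            self.resident.append(fragment)

    def pop(self):
        fragment = self.resident.popleft()
        if self.overflow:   # page the oldest overflowed fragment back in
            self.resident.append(self.overflow.popleft())
        return fragment

    def __len__(self):
        return len(self.resident) + len(self.overflow)


queue = FragmentQueue(capacity=2)
for depth in (5.0, 4.0, 3.0, 2.0):
    queue.push(Fragment(x=10, y=20, r=1.0, g=0.0, b=0.0, a=0.5, z=depth))
drained = [queue.pop().z for _ in range(len(queue))]
print(drained)   # [5.0, 4.0, 3.0, 2.0]
```

Fragments leave in the order they entered even after the queue has overflowed, which is the property the multi-pass processing relies on.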



Figure 4 shows a pixel with 8 fragments, one of which is opaque, O. The transparent fragments are labelled 
T1 to T4 and Tx, Ty, and Tz. The fragments are drawn in the order shown, 1, 2, 3, ..., 8, and processing 
occurs as shown in the figure. Processing occurs by first considering the fragments during rasterization. 
This is phase 1. Then, if there have been fragments placed into the fragment buffer, following passes are 
performed. Only a single pixel per screen location is saved, so when multiple fragments that are unoccluded 
lie upon a single pixel, they are sent to the fragment buffer. This example has 6 passes: phase 1, phase 2, 
phase 3_1, phase 3_2, phase 3_3, and phase 3_4. For a shorthand notation, the frame buffer is labelled B, the 
next Z value, or 2nd Z value, is B_nz, next Z prime is B'_nz, and the fragment under consideration is F. 

Figure 4 shows how during phase 1, any opaque layers are found. In this case transparent fragments 
closer than the opaque layer were all queued, T1, T2, T3, and T4. An underline indicates the fragments in 
the fragment buffer. The transparent layers beyond the opaque layer, Tx and Ty, were queued. In phase 
2, these fragments, Tx and Ty, are culled, and the furthest transparent layer's Z is saved. The fragments 
are processed in the same order each time as they are read from and written to the fragment buffer. Note 
that the true depth complexity of the pixel is 5, but we took 6 passes. In cases where no 
further-than-opaque transparent fragments are queued, there is one less pass, 5. Phase 2 culls fragments 
Ty and Tx as they are further from the eye point than the opaque layer O. Next, in phase 2, fragment T1 has 
its Z value saved as B_nz, and is put on the fragment buffer, as shown by S. Fragments T2, T3, and T4 are also 
requeued. In phase 3_1, the frame buffer, B, holds the opaque Z, O_z, and color attributes. B_nz, or Next Z, is 
also in the frame buffer, and holds the proper Z value for the furthest transparent layer, T1. In phase 3_1, the 
fragments come out of the fragment buffer in the same order that they were placed there. First, fragment 
T1 is read, and its Z value matches B_nz. Therefore, it is immediately composited with the frame buffer. 
The next fragment read is T2, and it is the furthest Z that is closer than B_nz, so B'_nz (B next 
Z prime) is written, and the fragment is requeued (S). Fragments T3 and T4 are considered and requeued 
(S). 

Phase 3_2 continues the same as phase 3_1, with the remaining fragments on the fragment buffer. These 
are T2, T3, and T4. Note that the B'_nz, or next Z prime, of phase 3_1 is the B_nz, or next Z, of phase 3_2. This 
alternation of the interpretation of B_nz and B'_nz continues for each even and odd phase 3 needed. The Z 
storage used is the frame buffer Z, and the 2nd frame buffer Z. Always just 2 Z values per pixel. 
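
The alternation of B_nz and B'_nz can be pictured as two storage slots whose roles swap on every phase 3 pass, so that the value written as B'_nz in one pass is read as B_nz in the next without being copied. The sketch below models only that role swap. The class, the slot indices, and the depth values are hypothetical, and it does not model which physical frame buffer locations the report assigns to each role.

```python
class NextZPair:
    """Two Z slots whose roles as B_nz and B'_nz swap on every phase 3 pass."""

    def __init__(self, first_bnz):
        self.slots = [first_bnz, None]
        self.bnz_slot = 0                  # index of the slot currently read as B_nz

    @property
    def bnz(self):
        return self.slots[self.bnz_slot]

    def write_bnz_prime(self, z):
        self.slots[1 - self.bnz_slot] = z

    def next_pass(self):
        self.bnz_slot = 1 - self.bnz_slot  # reinterpret the slots; no data is copied


pair = NextZPair(first_bnz=4.0)   # hypothetical depth of T1
pair.write_bnz_prime(3.0)         # hypothetical depth of T2, found during the pass
pair.next_pass()
print(pair.bnz)                   # 3.0
```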

Phase 3_3 starts with a buffer, B, that is the previously composited opaque, and furthest two transparent 
fragments. B_nz is the Z value of fragment T3. T3 is the first fragment considered, so it is matched 
with B_nz and also composited. Fragment T4's Z is set to B'_nz. Then in phase 3_4, fragment T4 is considered 
again, matches B_nz, and is composited into the final correct pixel color. Once the fragment buffer is emptied, 
processing completes for that frame. The fragment buffer must contain the location of the fragments, as 
fragments for the whole screen are intermixed on the fragment buffer. Figure 5 shows possible fragments 

Eye   T4  T3  T2  T1  O   Tx  Ty  Tz                              time

      7   6   5   4   8   3   2   1                               phase 1
      B = fragment Tz                                             1
      B_nz/S                                                      2
      S                                                           3
      S                                                           4
      S                                                           5
      S                                                           6
      S                                                           7
      Discard B (Tz), B = opaque, invalid                         8

      T4  T3  T2  T1  O   Tx  Ty
      6   5   4   3   B   2   1                                   phase 2
      fragments further than opaque culled: cull Ty               1
                                            cull Tx               2
      B_nz/S                                                      3
      S                                                           4
      S                                                           5
      S                                                           6

      T4  T3  T2  T1  O
      4   3   2   1   B           (fragment 1 matches B_nz)       phase 3_1
      F_z == B_nz, composite                                      1
      B'_nz/S                                                     2
      S                                                           3
      S                                                           4

      T4  T3  T2  O
      3   2   1   B               (fragment 1 matches B_nz)       phase 3_2
      F_z == B_nz, composite                                      1
      B'_nz/S                                                     2
      S                                                           3

      2   1   B                   (fragment 1 matches B_nz)       phase 3_3
      F_z == B_nz, composite                                      1
      B'_nz/S                                                     2

      1   B                       (fragment 1 matches B_nz)       phase 3_4

Figure 4: Fragment processing example, for Z-buffer, with extra Z, and fragment buffer. 
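
The pass structure of the Figure 4 example can be modelled in software for a single pixel. The sketch below is a simplification and not the state machine of Section 3: it queues every transparent fragment in phase 1, always spends phase 2 culling fragments behind the opaque layer and finding the furthest remaining depth, and then composites one depth per phase 3 pass. The depth values, colors, opacity, and function names are hypothetical, and larger Z is assumed to be farther from the eye.

```python
from collections import deque


def over(src_rgb, alpha, dst_rgb):
    """Source-over blend of a non-premultiplied color onto the frame buffer color."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src_rgb, dst_rgb))


def resolve_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """fragments: (name, z, rgb, alpha) in drawing order; larger z is farther.

    Returns (final_rgb, names_in_composite_order, number_of_passes).
    """
    # Phase 1: keep the closest opaque fragment; queue transparent fragments.
    frame_rgb, frame_z = background, float("inf")
    queue = deque()
    for frag in fragments:
        name, z, rgb, alpha = frag
        if alpha >= 1.0:
            if z < frame_z:
                frame_rgb, frame_z = rgb, z
        else:
            queue.append(frag)
    passes, order = 1, []
    if not queue:
        return frame_rgb, order, passes

    # Phase 2: cull fragments behind the opaque layer, record the furthest survivor.
    survivors, next_z = deque(), None
    for frag in queue:
        if frag[1] >= frame_z:
            continue                                  # occluded by the opaque layer
        survivors.append(frag)
        next_z = frag[1] if next_z is None else max(next_z, frag[1])
    queue, passes = survivors, passes + 1

    # Phase 3 passes: composite the fragment at next_z, requeue the rest in order.
    while queue:
        requeued, next_z_prime = deque(), None
        for frag in queue:
            name, z, rgb, alpha = frag
            if z == next_z:
                frame_rgb = over(rgb, alpha, frame_rgb)
                order.append(name)
            else:
                requeued.append(frag)
                next_z_prime = z if next_z_prime is None else max(next_z_prime, z)
        queue, next_z, passes = requeued, next_z_prime, passes + 1
    return frame_rgb, order, passes


GRAY, HALF = (0.5, 0.5, 0.5), 0.5
draw_order = [("Tz", 8, GRAY, HALF), ("Ty", 7, GRAY, HALF), ("Tx", 6, GRAY, HALF),
              ("T1", 4, (1.0, 0.0, 0.0), HALF), ("T2", 3, (0.0, 1.0, 0.0), HALF),
              ("T3", 2, (0.0, 0.0, 1.0), HALF), ("T4", 1, (1.0, 1.0, 0.0), HALF),
              ("O", 5, (1.0, 1.0, 1.0), 1.0)]
color, order, passes = resolve_pixel(draw_order)
print(order, passes)   # ['T1', 'T2', 'T3', 'T4'] 6
```

For this drawing order the model takes 6 passes and composites T1 through T4 back to front, matching the count given in the text.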



[Table 1 body not recoverable. Column headings: physical location, phase 1, phase 2, phase 3 odd, 
phase 3 even, phase 3 odd. Row labels: B_nz, B_z.] 

Table 1: Use of 2nd or extra z buffer. The next z prime (B'_nz) and next z (B_nz) alternate interpretation 
by phase, while their location is in B_nz or B_z of the frame buffer physical location. See on the left 
[illegible]. 
on the fragment buffer, with location (X_o, Y_o) intermixed with (X_p, Y_p). The state machines that control 
processing for the fragment buffer are described in the next section. 

X_p, Y_p, Z2, A2 
X_p, Y_p, Z4, A4 
X_o, Y_o, Z, A      <- fragment from a different location 
X_p, Y_p, Z3, A3 
X_p, Y_p, Z5, A5 

Figure 5: Fragments on the fragment buffer. 



3 State Machine Specification for Single Frame Buffer Solution 

Figures 6, 7, and 8 show the state transition diagrams of frame buffer pixels during processing. Figure 9 
shows the fragment buffer comparison and controller states beginning with initialization into phase 1 after a 
new frame. The number of phases varies depending on the frame's depth complexity, as illustrated in the 
example in the previous section. Phase 1 is always first, and all fragments are considered during rasterization. 
After all fragments from rasterization have been processed, any fragments that were placed on the fragment 
buffer are considered in phase 2, and so on. The processing terminates when the fragment buffer is emptied. 
For phase 3, processing alternates between odd and even phases as shown in Figure 9. 

Each frame buffer location has a unique state. So each pixel has a state machine; you only need to 
consider the state machine for the fragment [illegible]. The state machine requires 6 states for phase 1, 
3 states for phase 2, and 3 states for phase 3. The 6 states have been labelled as shown in Table 2. 



BOTH_INVALID      initial state, no fragments seen here 
VALID_OPAQUE      1 opaque fragment seen and stored 
VALID_TRANS       1 transparent fragment seen and stored 
OPAQUE_INV        an opaque fragment stored closer than queued fragments 
BOTH_VALID_T_O    at least two fragments, opaque and transparent 
BOTH_VALID_T_T    at least two fragments, both transparent 

Table 2: State assignment definitions for the 6 states in phase 1. 

The same state assignments are reused for all 3 phases. Of course interpretation varies between phases. 
The 2nd and 3rd phases will generate an error if a fragment is located where zero or 1 were seen in phase 
1. Phase 1 states have been separated into 3 columns. In the left column, no fragment has been seen, and 
therefore B and B_nz are both invalid. In the middle column, B is filled. In the right column B and B_nz are 
filled and at least 1 fragment is on the fragment buffer. 

For this implementation, the rule for equal valued depths is that the earlier fragment is in back of 
the following fragments. For a transparent fragment to be seen, it must be less than the opaque Z. After 
the fragment buffer is emptied, the rendering and compositing are complete. If there is not more than 
1 visible opaque or transparent fragment per pixel, then processing completes in phase 1. With only 2 

Figure 6: Phase 1 frame buffer pixel state transition diagram. 




[Figure 7 diagram. Legible labels: BOTH_INVALID, VALID_OPAQUE (rerun phase 1 for these [illegible]); 
BOTH_VALID_T_O, BOTH_VALID_T_T; both valid.] 



Figure 7: Phase 2 frame buffer pixel state transition diagram. 




[Figure 8 diagram. Legible labels: BOTH_INVALID, VALID_OPAQUE, VALID_TRANS, OPAQUE_INV; 
BOTH_VALID_T_O.] 

Figure 8: Phase 3 frame buffer pixel state transition diagram. For phase 3, B_nz and B'_nz alternate in storage 
location between even and odd phases. 

current state     inputs                            outputs/side-effects               next state
----------------  --------------------------------  ---------------------------------  ----------------
BOTH_INVALID      F_o                               B = F                              VALID_OPAQUE
BOTH_INVALID      F_t                               B = F                              VALID_TRANS
VALID_OPAQUE      F_z >= B_z                        none (cull fragment)               VALID_OPAQUE
VALID_OPAQUE      F_o, F_z < B_z                    B = F                              VALID_OPAQUE
VALID_OPAQUE      F_t, F_z < B_z                    B_nz = F_z, queue(F)               BOTH_VALID_T_O
VALID_TRANS       F_o, F_z <= B_z                   B = F (cull B; replace frame)      VALID_OPAQUE
VALID_TRANS       F_o, F_z > B_z                    queue(B), B_nz = B_z, B = F        BOTH_VALID_T_O
VALID_TRANS       F_t, F_z <= B_z                   B_nz = F_z, queue(F)               BOTH_VALID_T_T
VALID_TRANS       F_t, F_z > B_z                    queue(B), B_nz = B_z, B = F        BOTH_VALID_T_T
BOTH_VALID_T_O    (F_x), F_z >= B_z                 none (cull fragment)               BOTH_VALID_T_O
BOTH_VALID_T_O    F_o, F_z < B_z & F_z > B_nz       B = F                              BOTH_VALID_T_O
BOTH_VALID_T_O    F_o, F_z < B_z & F_z <= B_nz      B = F, B_nz = 0 (invalidate B_nz)  OPAQUE_INV
BOTH_VALID_T_O    F_t, F_z < B_z & F_z > B_nz       B_nz = F_z, queue(F)               BOTH_VALID_T_O
BOTH_VALID_T_O    F_t, F_z < B_z & F_z <= B_nz      queue(F)                           BOTH_VALID_T_O
BOTH_VALID_T_T    F_o, F_z > B_z                    B_nz = B_z, queue(B), B = F        BOTH_VALID_T_O
BOTH_VALID_T_T    F_o, F_z <= B_z & F_z > B_nz      B = F (replace frame)              BOTH_VALID_T_O
BOTH_VALID_T_T    F_o, F_z <= B_z & F_z <= B_nz     B = F, B_nz = 0 (replace frame)    OPAQUE_INV
BOTH_VALID_T_T    F_t, F_z > B_z                    B_nz = B_z, queue(B), B = F        BOTH_VALID_T_T
BOTH_VALID_T_T    F_t, F_z <= B_z & F_z > B_nz      B_nz = F_z, queue(F)               BOTH_VALID_T_T
BOTH_VALID_T_T    F_t, F_z <= B_z & F_z <= B_nz     queue(F)                           BOTH_VALID_T_T
OPAQUE_INV        F_z >= B_z                        none (cull fragment)               OPAQUE_INV
OPAQUE_INV        F_o, F_z < B_z                    B = F                              OPAQUE_INV
OPAQUE_INV        F_t, F_z < B_z                    queue(F)                           OPAQUE_INV

Table 3: State transition table for phase 1. queue(X) means to place fragment X on fragment buffer or 
queue. B - frame buffer, F - fragment being considered, F_o - fragment opaque, F_t - fragment transparent, 
(F_x) - opaque/transparent don't care, F_z - fragment depth value, B_z - frame buffer depth value, B_nz - 
frame buffer next z depth value. 
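
The phase 1 transitions can be expressed as a single function per fragment. The sketch below follows one reading of Table 3 and is not the hardware implementation: the source scan leaves one current-state cell and one input cell blank, and they are filled here by elimination (BOTH_VALID_T_T and "F_t, F_z > B_z" respectively). The class and function names are hypothetical, larger Z is assumed to be farther from the eye, and an invalid B_nz is represented by None rather than 0.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    BOTH_INVALID = auto()
    VALID_OPAQUE = auto()
    VALID_TRANS = auto()
    OPAQUE_INV = auto()
    BOTH_VALID_T_O = auto()
    BOTH_VALID_T_T = auto()


@dataclass
class Frag:
    name: str
    z: float          # assumption: larger z is farther from the eye
    opaque: bool


@dataclass
class Pixel:
    state: State = State.BOTH_INVALID
    b: Optional[Frag] = None       # frame buffer fragment B
    bnz: Optional[float] = None    # next Z, B_nz (None means invalid)


def phase1(p, f, queue):
    """Apply one phase 1 transition for fragment f at pixel p."""
    s = p.state
    if s is State.BOTH_INVALID:
        p.b = f
        p.state = State.VALID_OPAQUE if f.opaque else State.VALID_TRANS
    elif s in (State.VALID_OPAQUE, State.OPAQUE_INV):
        if f.z >= p.b.z:
            return                                  # cull fragment
        if f.opaque:
            p.b = f
        elif s is State.VALID_OPAQUE:
            p.bnz = f.z
            queue.append(f)
            p.state = State.BOTH_VALID_T_O
        else:
            queue.append(f)                         # B_nz stays invalid
    elif s is State.VALID_TRANS:
        if f.z > p.b.z:                             # f is behind B, so queue B
            queue.append(p.b)
            p.bnz, p.b = p.b.z, f
            p.state = State.BOTH_VALID_T_O if f.opaque else State.BOTH_VALID_T_T
        elif f.opaque:
            p.b = f                                 # cull B, replace frame
            p.state = State.VALID_OPAQUE
        else:
            p.bnz = f.z
            queue.append(f)
            p.state = State.BOTH_VALID_T_T
    elif s is State.BOTH_VALID_T_O:
        if f.z >= p.b.z:
            return                                  # cull fragment
        if f.opaque:
            p.b = f
            if f.z <= p.bnz:
                p.bnz, p.state = None, State.OPAQUE_INV
        else:
            p.bnz = max(p.bnz, f.z)
            queue.append(f)
    else:                                           # BOTH_VALID_T_T
        if f.z > p.b.z:
            queue.append(p.b)
            p.bnz, p.b = p.b.z, f
            if f.opaque:
                p.state = State.BOTH_VALID_T_O
        elif f.opaque:
            p.b = f                                 # replace frame
            if f.z <= p.bnz:
                p.bnz, p.state = None, State.OPAQUE_INV
            else:
                p.state = State.BOTH_VALID_T_O
        else:
            p.bnz = max(p.bnz, f.z)
            queue.append(f)


# Figure 4 drawing order with hypothetical depths.
draw_order = [Frag("Tz", 8, False), Frag("Ty", 7, False), Frag("Tx", 6, False),
              Frag("T1", 4, False), Frag("T2", 3, False), Frag("T3", 2, False),
              Frag("T4", 1, False), Frag("O", 5, True)]
pixel, queue = Pixel(), []
for frag in draw_order:
    phase1(pixel, frag, queue)
print(pixel.state.name, pixel.b.name, [f.name for f in queue])
```

Run on the Figure 4 drawing order, the pixel ends phase 1 in OPAQUE_INV with O in the frame buffer, Tz discarded, and Ty, Tx, T1, T2, T3, T4 queued in that order, which agrees with the phase 1 and phase 2 rows of the figure.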

[Figure 9 diagram. Legible labels: process all fragments from rasterization; previous phase.] 

Figure 9: Overall state machine for the fragment buffer comparison and controller of Figure 3. 



current state     inputs                                     outputs/side-effects     next state
BOTH_VALID_T_O    F_z = B_nz                                 B = composite(B, F)      BOTH_VALID_T_O
BOTH_VALID_T_O    F_z != B_nz, B'_nz >= B_nz                 B'_nz = F_z, queue(F)    BOTH_VALID_T_O
BOTH_VALID_T_O    F_z != B_nz, B'_nz < B_nz, F_z <= B'_nz    queue(F)                 BOTH_VALID_T_O
[illegible]       F_z != B_nz, B'_nz < B_nz, F_z > B'_nz     B'_nz = F_z, queue(F)    BOTH_VALID_T_O
[illegible]       F_z >= B_z                                 none (cull fragment)     OPAQUE_INV
OPAQUE_INV        F_z < B_z & F_z > B_nz                     [illegible]              OPAQUE_INV
OPAQUE_INV        F_z < B_z & F_z <= B_nz                    queue(F)                 OPAQUE_INV

(OPAQUE_INV provides same behavior as BOTH_VALID_T_O of phase 1)



Table 4: State transition table for phase 2. queue(X) means to place fragment X on fragment buffer or
queue. B - frame buffer, F - fragment being considered, F_o - fragment opaque, F_t - fragment transparent,
(F_x) opaque/transparent don't care, F_z - fragment depth value, B_z - frame buffer depth value, B_nz -
frame buffer next z depth value, composite(B, F) - Porter and Duff over operator or OpenGL composite.



current state     inputs          outputs/side-effects                      next state
BOTH_VALID_T_O    Same as phase 2 above
OPAQUE_INV        F_z == B_nz     B = composite(B, F), B_z = B_nz           BOTH_VALID_T_O
                                  (or B_z = B_nz, B_nz = F_z already)
OPAQUE_INV        F_z != B_nz     B_nz = F_z, queue(F) (or B'_nz = F_z)     BOTH_VALID_T_O



Table 5: State transition table for phase 3. OPAQUE_INV only occurs in phase 3_1, not in phase 3_2, phase
3_3, etc. Note that for phase 3, the B_nz and B'_nz are in different physical locations depending on even or
oddness of the phase. Phase 2 looks like an even phase 3.

fragments in a pixel that are not occluded, processing may complete in phase 2, and so on. Tables 3, 4, and 5
provide the state transition diagram: the next state transitions as well as the side effects and outputs. In
phase 3 the B_nz and B'_nz alternate their physical location as shown below in Table 6. The outputs and side
effects are given in sequential order. For example, in Table 3, 7th row, "VALID_TRANS, F_o, F_z > B_z," the
frame buffer fragment is written to the fragment buffer, queue(B), its Z value is saved as next z, B_nz = B_z,
and it is then overwritten by the fragment under consideration, B = F.
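
The sequential side effects of the 7th row, and the five BOTH_VALID_T_O rows of Table 3, can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `Fragment` class, the function names, and the use of a Python list as the fragment buffer are hypothetical, larger z is taken to be farther from the eye, and a B_nz of 0 is taken to mean "invalid", following the "B_nz = 0 (invalidate B_nz)" entry.

```python
from dataclasses import dataclass


@dataclass
class Fragment:            # hypothetical container, not from the original text
    z: float               # depth; larger z is assumed farther from the eye
    opaque: bool


def valid_trans_opaque_behind(B, F, queue):
    """Table 3, 7th row: VALID_TRANS, F_o, F_z > B_z (side effects in order)."""
    queue.append(B)        # queue(B): save the transparent frame buffer fragment
    Bnz = B.z              # B_nz = B_z: its depth becomes the next z
    B = F                  # B = F: the opaque fragment overwrites the frame buffer
    return B, Bnz, "BOTH_VALID_T_O"


def both_valid_t_o(B, Bnz, F, queue):
    """One phase 1 step while in BOTH_VALID_T_O; returns (B, B_nz, next state)."""
    if F.z >= B.z:                       # behind the opaque layer: cull fragment
        return B, Bnz, "BOTH_VALID_T_O"
    if F.opaque:
        if F.z > Bnz:                    # closer opaque, still behind next z
            return F, Bnz, "BOTH_VALID_T_O"
        return F, 0.0, "OPAQUE_INV"      # occludes next z too: invalidate B_nz
    queue.append(F)                      # transparent and visible: queue(F)
    if F.z > Bnz:                        # new furthest transparent in front of B
        return B, F.z, "BOTH_VALID_T_O"
    return B, Bnz, "BOTH_VALID_T_O"
```

Each function returns the new frame buffer fragment, the new next z, and the next state, so a per-pixel controller could be driven by feeding fragments through these steps one at a time.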



physical location    phase 1    phase 2    phase 3 odd    phase 3 even    phase 3 odd
[row entries largely illegible; the surviving entries read B_nz]

Table 6: Phase 3, physical location changing meaning from B_nz to B'_nz in odd and even phases.

This state machine has been fully implemented and debugged, and several models have been run through 
it as discussed in the next section. 



4 Results 

The fragment buffer implementation with a Z buffer and an extra Z buffer as described in the previous
section has been implemented. Statistics have been gathered on several models to provide an indication of
the performance implications. Table 7 provides statistics for processing of 4 scenes. All images were 512x512
pixels. The scenes are shown in Figures 11 to 14. Artifacts from OpenGL (middle images) include
the tire in the back seat of the chevy in Figure 13, and the nose gear on top of the helicopter in Figure 14.



data       no. of phase 3    total passes    Z bandwidth    frag bandwidth    avg depth
scene      6                 8               2,765,189      7,497,201         2.35
spheres    10                12              8,5?8,293      36,889,274        3.93
chevy      8                 10              3,404,954      8,???,718         2.17
heli       9                 11              2,798,204      8,155,998         2.66

Table 7: Fragment buffer processing statistics, bandwidth in bytes, all 512x512 frames.

The necessary bandwidth to the memory holding the frame buffer, extra Z buffer, and fragment buffer
was computed during the execution of the simulation. This considers the memory model shown in Figure 2,
where the frame buffer, extra Z buffer, and fragment buffer are all in the same memory. Table 7 provides the
memory traffic for the conventional Z buffer (Z bandwidth), and the fragment buffer (frag bandwidth). The
depth complexity is also provided in the table, and is an average over pixels covered, for the case where all
geometry is considered transparent. Figure 10 plots the conventional Z buffering bandwidth against the
fragment buffer bandwidth. For these scenes, it can be seen that the number of passes may be high, on the
order of 8 to 12 passes. For complex interpenetrating geometry, the depth complexity can be arbitrarily high
at a given pixel. For example, in unstructured volume rendering, the depth complexity will be much higher,
as there are thousands of layers for some pixel locations. But, because the application sorting of the data
prior to rendering is such a burden, even the numerous passes of the fragment buffer will achieve superior
results.

The key thing about the bandwidth numbers is that they are on the same order as the Z buffering,
and there will also not be any texturing at the time that the fragments are being sorted. Therefore, the
performance impact may be modest, because the first pass will be similar to Z-buffering, and the subsequent
passes will not be competing with texture mapping operations. The ratio of Z buffering traffic to total
fragment buffer traffic varies from 2.5 to 4.5 in these examples. These examples are also severe, in that all
surfaces are transparent, so a new capability will place different stresses on the system. True transparency
is not as easy as Z buffering, so some difference in processing is expected. The number of passes for these
scenes is not expected to vary with higher resolutions, but the number of fragments will increase in both Z
buffering and fragment buffer processing.




Figure 10: Bandwidth for the four scenes shown in Table 7. Conventional Z buffering traffic in bytes is
compared to rendering all surfaces as partially transparent with the fragment buffer. Those with the highest
scene complexity and depth complexity require more bandwidth.



Figure 11: A scene of a cone, torus, and sphere. The left image is Z buffering, the middle image is OpenGL,
and the right image is the fragment buffer. 

Figure 15 shows an example of the multiple passes that are taken. In the phase 1 pass, in the upper
left, the farthest fragments are determined, and placed into the frame buffer. On the next pass, in phase 2,
the next furthest transparent layers are composited into the frame buffer. In these renderings, the front and
back faces of triangles are shaded, as the front and rear of the sphere are visible with all surfaces slightly
transparent.

Figure 12: A scene of 7 intersecting spheres. The left image is Z buffering, the middle image is OpenGL,
and the right image is the fragment buffer. This scene is meant to compare to Mammen's Figure 4, where

Figure 14: A scene of a Caligari trueSpace model of an Apache Helicopter. The left image is Z buffering,
the middle image is OpenGL, and the right image is the fragment buffer.

The implementation was also rigorously validated with permutations of cases of 3 transparent levels, as
shown in the introduction, and with 3 transparent levels and an opaque layer that was placed in all possible
locations: in front of all three, at the same depth as each of the three, behind all three, or in between the
1st and 2nd or 2nd and 3rd. The transparent layers could be drawn in 6 orders. The opaque layer was placed
at 7 different locations, and the opaque layer could be drawn in 4 different orders in relation to the
transparent layers. Figure 16 shows 3 examples from the 168 (6*7*4) combinations that were verified.
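
The count of 168 validation cases can be reproduced by enumerating the three factors. The sketch below is only illustrative: the layer names, the placement labels, and the encoding of the opaque draw order as a slot index 0 to 3 are assumptions made here, not the original test harness.

```python
from itertools import permutations, product

# 6 orders in which the three transparent layers can be drawn
transparent_orders = list(permutations(("T1", "T2", "T3")))

# 7 depth placements of the opaque layer relative to the transparent layers
opaque_placements = [
    "in front of all three",
    "same depth as 1st",
    "between 1st and 2nd",
    "same depth as 2nd",
    "between 2nd and 3rd",
    "same depth as 3rd",
    "behind all three",
]

# 4 draw-order slots: opaque drawn before, between, or after the 3 layers
opaque_draw_slots = range(4)

cases = list(product(transparent_orders, opaque_placements, opaque_draw_slots))
print(len(cases))  # 6 * 7 * 4
```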

From the simulation, conclusions can be made regarding the size of the memory needed to support this
functionality. The frame buffer is assumed to contain a fragment, which I define simply as R, G, B, A,
taking 5 bytes or so. Each color component takes 1 byte, and alpha typically needs higher precision for
compositing, which would be necessary for an architecture supporting true transparency. In the examples
shown, [illegible], so an alpha buffer isn't actually needed in this instance but has been assumed. So,
resolution x 5 is the frame buffer storage in bytes. Additionally, a Z buffer is used, and typically assumed
to take 4 bytes. For fragment buffer processing an additional Z buffer is used, for 4 additional bytes per
pixel. It is assumed that the 3 bits needed for state per pixel can be within this 4 bytes, or simply added,
4 bytes + 3 bits.

Now the actual size of the fragment buffer to be used is at least as large as the number of fragments
to be stored in the fragment buffer on phase 1. This is somewhat arbitrary, as it depends on the depth
complexity (DC) of the scene (fully transparent depth complexity), and the percent of the frame buffer
covered. The [illegible] added storage is straightforward to [illegible] Z's bytes (4), and the address of
the pixel the fragment maps to (3), to total 12 = 5 + 4 + 3. In this case 3 bytes would be able to address
a 4096x4096 frame buffer. Table 7 detailed the bandwidth [illegible]. For these datasets, [illegible] that
from 2.13 to 3.63 times memory is needed for a system with the fragment buffer. The resolutions can be
scaled up, and the assumption that the same [illegible] that the depth complexity stays the same, means the
ratios are approximately [illegible] the size of the address necessary to store on the fragment buffer.
[illegible] 512x512 frame buffer: 18 bits may be used. For slightly more bits higher resolutions are
achieved, [illegible], 1280x1024 with 21 bits, and 1600x1200 with 22 bits. This increases the ratios of
additional memory needed only slightly (2.13 to 2.17 for the helicopter).
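
The address widths and the constant columns of Table 8 can be checked with a few lines of arithmetic. This is a sketch under assumptions: the function names are hypothetical, the pixel address is assumed to store X and Y as separate bit fields (which matches the 21 and 22 bit figures quoted above), and the frame buffer is assumed to hold 5 bytes of color plus 4 bytes of Z per pixel, which is what reproduces the 2.25 MB entry of Table 8.

```python
BYTES_COLOR = 5   # R, G, B, A as described in the text
BYTES_Z = 4       # one Z value
BYTES_ADDR = 3    # pixel address stored with each queued fragment

# bytes per fragment stored on the fragment buffer: 12 = 5 + 4 + 3
BYTES_PER_QUEUED_FRAGMENT = BYTES_COLOR + BYTES_Z + BYTES_ADDR


def address_bits(width, height):
    """Bits needed to tag a fragment with its pixel, X and Y kept separately."""
    return (width - 1).bit_length() + (height - 1).bit_length()


def frame_buffer_mb(width, height):
    """Color plus Z storage in MBytes (1 MByte = 2**20 bytes)."""
    return width * height * (BYTES_COLOR + BYTES_Z) / 2**20


def extra_z_mb(width, height):
    """Storage for the additional Z buffer in MBytes."""
    return width * height * BYTES_Z / 2**20


for w, h in [(512, 512), (1280, 1024), (1600, 1200), (4096, 4096)]:
    print(w, h, address_bits(w, h))
```

The same `address_bits` helper gives 12 bits for a 64x64 tile, which is the basis of the tile savings discussed after Table 8.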



data      frame buffer    extra Z    fragment buffer    total fragment buffer    total frag/frame buffer
scene     2.25 MB         1 MB       1.57 MB            4.82 MB                  2.14
sphere    2.25 MB         1 MB       4.92 MB            8.17 MB                  3.63
chevy     2.25 MB         1 MB       1.85 MB            5.10 MB                  2.27
heli      2.25 MB         1 MB       1.55 MB            4.80 MB                  2.13

Table 8: Fragment buffer processing statistics, minimum memory usage in MBytes, for the 512x512 frame
buffer (constant), extra Z (constant), fragment buffer, total of fragment buffer, frame buffer, and extra Z,
and a ratio of the total fragment buffer memory versus the frame buffer for traditional Z buffering.

But, for tile based hardware, there are additional improvements possible because of the reduced address
tag stored with each fragment in the fragment buffer. For example, a 1600x1200 screen requires 11 bits for X
and 11 for Y, for a 22 bit overhead per fragment. Tiling to 64x64 tiles requires 12 bits for X and Y, versus
22 bits, a 10 bit saving for fragments. The savings is a fixed [illegible] how large you decide to make the
fragment buffer, ranging from 10% to 15% for [illegible] tile. For most graphics the depth complexity is
considerably less, so less memory would be consumed.

5 Fragment Buffer Variants 

The fragment buffer implementation as described is targeted for lowest cost with reasonable performance.
A continuum, trading off memory versus number of passes, is possible. In addition, the memory
overhead is reduced for tile based architectures. Antialiasing can also be supported, which provides for
an adaptive supersampling strategy. Table 9 shows the number of passes needed for various hardware
complexities. Case 1 is 1 frame buffer, for which 2n passes are needed, where n is the worst case transparent
layer depth complexity. Case 2 is the case just presented, with one extra frame of Z. The number of passes
is halved to n. [illegible] compositing is needed, because compositing occurs when the fragment z value
matches the next z value, F_z == B_nz. Case 4 is the most general, where there are N frame buffers available
for the entire screen (or tile). In this case only n/N passes are necessary, but with the obvious increase in
necessary memory.



case    description                           total passes
1       1 frame buffer                        2n
2       1 frame buffer and 1 extra Z buffer   O(n)
3       2 frame buffers                       O(n)
4       N frame buffers                       n/N

Table 9: Trading off memory versus number of passes. Case 2 is the one presented in the previous section.
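
The pass counts quoted for the four cases can be written as a small helper. This is a hedged sketch: the function name is hypothetical, and rounding n/N up to a whole pass is an assumption made here, since a partial pass cannot be run.

```python
import math


def total_passes(case, n, N=1):
    """Passes for worst case transparent depth complexity n (cases of Table 9)."""
    if case == 1:                 # 1 frame buffer
        return 2 * n
    if case in (2, 3):            # extra Z buffer, or 2 frame buffers
        return n
    if case == 4:                 # N frame buffers (assumed rounded up)
        return math.ceil(n / N)
    raise ValueError("case must be 1, 2, 3, or 4")


print([total_passes(c, 6, N=4) for c in (1, 2, 3, 4)])
```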

In this section we briefly discuss the options of using fragment buffering with a tile based, deferred shading
architecture we call DeferredZ, and I show an example of processing the A-buffer, a software technique, in
hardware. The DeferredZ architecture is a pipeline that delays the z compares and sorting, and also allows
deferred shading to be performed. Figure 17 shows the overall DeferredZ architecture. From the left,
primitives are sent across AGP (Accelerated Graphics Port), into the Pre-culling and XY bucketization stage.
The output of this stage are the screen space primitives that survive culling. The next stage is the
Hierarchical Z-culling, or culling, where hierarchical Z-culling is performed only for a region of the screen.
Following this is XY rasterization of primitives that are not occluded. In this case rasterization means
conversion of the triangle data to raster XY coordinate location fragments, although all shading attributes
such as normals are forwarded with the fragment.

The next stage is the fragment buffer compare logic as described earlier, with compositing moved later
to follow shading. The Fragment Stack is used for temporary storage to consider surviving fragments.
Fragments after the first pass are sent along to the Lighting and Color Compositing and Shading phase of the
pipeline. Because the primitives are sent in the proper order, they can be directly composited once all
lighting calculations have been done. After this point primitives are sent directly into the frame buffer,
which is distributed in the screen bucket fashion. It is assumed that buckets would be screen square tiles,
but this architecture could also use scanline parallelism, or pixel interleaved parallelism.

There are four memories in the system, labelled A, B, C, and D. There are two new memories in B: the
Fragment Buffer and additional Z-buffers and attribute storage. There may be none or a small number of
additional Z-buffers. Figures 21 and 22 show N = 2, and other cases of N = 0 and N = 1 are analyzed also.
The pipelines are truly independent, and no intercommunication is required between them for single pass
rendering. This assumes that the texture memory is replicated so each pipeline has unfettered access to the
textures that they need.

Figure 17: DeferredZ overall architecture

5.1 Fragment buffer for correct transparency 

This variant of the fragment buffer provides deferred shading and true transparency. Figure 21 shows the
memories including the fragment buffer, depth buffer(s), and attribute buffer(s). Additional depth and
attribute buffers may be used, but the rendering algorithm will be described with two Z and attribute
buffers, and other cases will be described following that. The line surrounding the memories are those in
Figure 17 labelled B, fragment buffer and attribute buffers. The fragment buffer may be implemented as
stack, RAM, queue, or FIFO, because the order in which the fragments are considered is sequential. All
fragments are read in during a phase, and [illegible]

Referring to Figure 17, DeferredZ hardware, the XY sort portion performs the geometric processing
necessary to calculate the screen space vertex locations. The HiZ block performs hierarchical Z-buffering as
explored by Ned Greene [4]. At this point are the succeeding primitives, typically triangles defined by their
vertices. The triangles enter the rasterization stage, where fragments are determined. A fragment in the
DeferredZ architecture is the per sample data with a numerically computed perspective depth, and the
attributes, A, that are used [illegible]

Definition: Fragment = Z + Attributes ([illegible], texture coordinates).

The processing in the blocks given in Figure 21 consists of the following:

1. Rasterization: take succeeding primitives and compute succeeding fragments. 

2. Z compare/Z sort/AA compare/AA sort: Take succeeding fragments and process them to determine
the first set of visible fragments. The number of visible fragments determined depends upon the number
of Z and attribute buffers, A.

3. Light/shade/texture/composite: Take visible fragments, and compute the lighting, shading, and
texturing, and composite with themselves and framebuffer.

The calculation performed in the Z-compare/AA compare, Z-sort/AA sort can be described as occurring
in multiple phases. The first phase is where all triangles (or primitives, where primitives may include
polygons, lines, points, etc.) are rasterized, and sent to the compare and sort block. The compare and sort
block processes them to compute the Z ordering for N layers of attributes. If N is 0, and only the frame
buffer is used, the algorithm is different as mentioned earlier.

Consider two scenarios to simplify the description: first the scenario where there is a single sample per
pixel, and secondly the scenario where multiple samples per pixel are taken. For the first scenario, the
correct ordering of transparent layers can be determined. Further consider the case where there are two Z
and attribute buffers. The processing of primitives' fragments will take place in multiple phases. Figure
18 shows the pseudo-code for phase 1. Figure 19 shows the pseudo-code for phase 2. And Figure 20 shows
the pseudo-code for phase 3 and higher. Figure 23 shows for a single pixel a hypothetical fragment covering
and depth ordering. The eye is shown on the left, and the partially transparent fragments, T_x, are drawn
as a line, while the opaque fragments are drawn as a line with cross-hatching. Let us consider this as an
example and describe the processing that occurs during each phase of Z fragment sorting.

To start in phase 1, Figure 18, all primitives are transformed, occlusion tested, then rasterized. As
rasterized fragments are created, they are inserted into the two buffers to determine the closest opaque
fragment and the furthest unoccluded transparent fragment.

Rasterization:
  for all primitives:
    rasterize to fragments
    pass fragments to Z compare and Z sort

Z compare and Z sort phase 1:
  for all fragments (as received from rasterization) {
    fetch Zmin from memory
    if z >= Zmin discard fragment
    else if fragment is opaque {
      write z -> Zmin,
      write fragment info into attributes for shading/lighting etc.
    }
    else /* transparent */ {
      fetch Zfar transparent, and A-attributes from memory
      if Z > Zfar transparent {
        write z -> Zfar transparent and overwrite attributes
        write Zfar old and attributes to fragment buffer <X,Y,Z,A> written
      } /* z > Zfar transparent */
      else
        write Z, attributes to fragment buffer
    } /* else transparent */
  } /* for all fragments */

Figure 18: DeferredZ variant, Phase 1 Rasterization and Fragment Sorting.
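
The phase 1 pseudo-code can be turned into a small runnable sketch. The version below is an illustration only, not the implementation described in the paper: the `Frag` class, the dictionary-based per-pixel buffers, and the choice to spill the old Zfar only when one has already been written are assumptions made here. As in the pseudo-code, a transparent fragment that is later hidden by a closer opaque fragment is not removed in this phase.

```python
import math
from dataclasses import dataclass


@dataclass
class Frag:                 # hypothetical fragment record <X, Y, Z, A>
    x: int
    y: int
    z: float                # larger z is assumed farther from the eye
    opaque: bool
    attrs: str = ""


def phase1(fragments):
    """Keep closest opaque (Zmin) and furthest transparent (Zfar) per pixel."""
    zmin, amin = {}, {}     # closest opaque depth and attributes
    zfar, afar = {}, {}     # furthest transparent depth and attributes
    fragment_buffer = []
    for f in fragments:
        p = (f.x, f.y)
        if f.z >= zmin.get(p, math.inf):
            continue                            # discard: behind closest opaque
        if f.opaque:
            zmin[p], amin[p] = f.z, f.attrs     # write z -> Zmin and attributes
        elif f.z > zfar.get(p, -math.inf):
            if p in zfar:                       # spill old Zfar (assumed: only if set)
                fragment_buffer.append(Frag(f.x, f.y, zfar[p], False, afar[p]))
            zfar[p], afar[p] = f.z, f.attrs     # write z -> Zfar transparent
        else:
            fragment_buffer.append(f)           # write Z, attributes to buffer
    return zmin, zfar, fragment_buffer
```

After the call, `fragment_buffer` holds the fragments that the later phases would recirculate.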

After phase 1, all primitives have been considered and we have a processed frame buffer Z_near, A_near,
Z_far transparent, A_far transparent, and a fragment buffer for all remaining fragments to be considered
(X_f, Y_f, Z_f, A_f) as shown in Figure 22. Next is phase 2, to merge the remaining fragments on the
fragment buffer. For phase 2, unload the current Z and A buffers to the lighting, shading, and compositing
stage, or send Z_near, A_near of closest opaque layer and Z_far, A_far of furthest transparent fragment. Note
that in this form of the invention a valid or changed bit would be needed to know which fragments to read out,
unlike the fragment buffer described earlier.

Figure 23 shows a case where O1, the closest opaque layer, and T1, the furthest transparent layer, are
discarded. The remaining transparent layers are all sent to the fragment buffer. In this example, all layers
beyond O1 were either occluded or discarded before [illegible] buffer. This is not necessarily
the case, as the order of primitives is arbitrary to start, so [illegible] by closer, but
later arriving primitives must be occluded in pass 2. In the fragment buffer of the earlier sections this
event was noted by changing state to OPAQUE_INV, which means an opaque layer invalidated previous
information. For the second [illegible] will process the remaining [illegible]
using the same two Z buffers as before. Figure 24 shows the multiple phases, phase 2, phase 3, and phase 4,
to sort the remaining transparent layers. So, given N Z and A or attribute buffers, it will take a number of
passes ceil(D/N), where D is the worst case depth complexity, D - 1 transparent layers, and 1 opaque layer.
The algorithm for the phase 2 is given in Figure 19. The algorithm for the [illegible] is given in
Figure 20. The phases [illegible] earlier, with the difference here
being 2 full attribute and Z buffers, and a frame buffer following compositing. This aspect of the invention
has not yet been implemented.

set Zfar transparent to -infinity
set Zmin to previous Zmin or closest transparent fragment value
for all fragments (retrieved from fragment buffer) {
  fetch Zmin from memory
  if z >= Zmin discard fragment
  else /* transparent, and not occluded */ {
    fetch Zfar transparent, and A-attributes from memory
    if Z > Zfar transparent {
      if Zfar is valid {
        if Zfirstfromfar is valid {
          Zfirstfromfar, A go to fragment buffer
        }
        move Zfar, Afar to Zfirstfromfar, Afirstfromfar
      }
    } /* if z > Zfar transparent */
    else
      write Z, attributes to fragment buffer
  } /* else transparent */
} /* for all fragments */

Figure 19: DeferredZ variant, Phase 2, discarding further layers, and determining next N transparent
layers.

This demonstrates the logic of replacement during the fragment buffer processing. The key advantage
of the fragment buffer for transparency is it computes exactly the correct ordering with fixed Z buffer,
attribute buffer, and a variable fragment buffer. By only reading out fragments that have been updated,
and invalidating those while writing them out, you can get the valid bits for Zfar and Zfirstfromfar reset very
economically. The technique does not preclude image space subdivision, for example, two areas being worked
on, one with lighting and compositing occurring, and one with Z compare Z sort occurring, to effectively
load balance the work between units. Or, for example, passing along any succeeding fragments with texture
values to be processed, and re-Z-buffered, depending on the mix of work. Next to be described is antialiasing

for all fragments (retrieved from fragment buffer) {
  if z > Zfar {
    if Zfar is valid {
      if Zfirstfromfar is valid {
        Zfirstfromfar, A go to fragment buffer
      }
      move Zfar, Afar to Zfirstfromfar, Afirstfromfar
    }
    overwrite Zfar with Z
  }
  else if z > Zfirstfromfar {
    if Zfirstfromfar is valid {
      Zfirstfromfar, Afirstfromfar to fragment buffer
    }
    overwrite Zfirstfromfar with Z
  }
} /* for all fragments */

Figure 20: DeferredZ variant, Phase 3 and higher, determining next N transparent layers,
with the same fragment buffer.
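
The two-slot replacement logic of Figure 20 can be illustrated with a short Python sketch that tracks only depth values for one pixel. It is a simplified illustration with assumptions labeled in the comments: an invalid slot is modeled as `None` and accepts any fragment, and a fragment that is nearer than both slots is sent back to the fragment buffer, a branch that the pseudo-code does not show.

```python
def keep_two_farthest(zs):
    """Return (Zfar, Zfirstfromfar, spilled depths) for one pixel's fragments."""
    far = None              # Zfar; None models an invalid slot (assumption)
    first_from_far = None   # Zfirstfromfar; None models an invalid slot
    fragment_buffer = []
    for z in zs:
        if far is None or z > far:
            if far is not None:                       # Zfar is valid
                if first_from_far is not None:        # Zfirstfromfar is valid
                    fragment_buffer.append(first_from_far)
                first_from_far = far                  # move Zfar to Zfirstfromfar
            far = z                                   # overwrite Zfar with Z
        elif first_from_far is None or z > first_from_far:
            if first_from_far is not None:
                fragment_buffer.append(first_from_far)
            first_from_far = z                        # overwrite Zfirstfromfar
        else:
            fragment_buffer.append(z)                 # assumed: recirculate fragment
    return far, first_from_far, fragment_buffer


print(keep_two_farthest([3, 7, 5, 9, 1]))
```

Running further passes over the returned spill list would peel off the next two layers each time, which is the multipass behavior the text describes.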

5.2 Fragment Stack for A-buffer/Carpenter Antialiasing 

An instructional example to demonstrate the flexibility of the invention is to show how the A-buffer, an
antialiasing scheme by Loren Carpenter, may be implemented. Some key things to note about the A-buffer
are that it has often been emulated, is a software only technique, and performs correct transparency as
well as antialiasing. Because it is a software only technique, showing how it may be practically implemented
in hardware is a significant advancement beyond the state-of-the-art. Other hardware designs have
emulated some part of Carpenter's A-buffer, such as Stephanie Winner et al. [12], but with trade-offs such
as handling only a fixed number of fragments per pixel, say 4. Others such as Molnar et al. [8, 10] have
claimed to be able to perform A-buffer processing, but such claims are not supported. PixelFlow cannot
sort transparent layers because their compositing network is only a Z-comparison/Z-buffering network. To
do proper resorting and compositing is an unsolved problem for a PixelFlow, or sort last, architecture.

A-buffer defines a fragment as a polygon clipped to a pixel boundary. They have two cases: either a pixel
is fully covered by a polygon, or a pixel is partially covered.
The data structure uses a linked list of fragments, sorted from front-to-back by their minimum Z, shown in
Figure 26. A pixel struct stores both cases of a pixel. Figure 25 shows the pixel struct.

The mask is a 4x8 bit mask, representing subsample locations that the polygon covers. Two Z values are
saved for each fragment, a minimum and maximum Z value, to aid in blending fragments that overlap. The
result of processing is an array of pixel structs, the size of the frame buffer, and linked lists of fragments of
variable length for each pixel.

Now, to implement the same data structures with the fragment buffer requires sending the fragments
for each pixel on to the fragment buffer, if they are not the closest pixels, where we have N Z and
attribute buffers. For earlier examples N is 1 or 2, and so I use N = 2 for this example also. All polygons
are rendered and converted to fragments, and the 2 closest to the eye fragments are stored in the random
access attribute buffers, while all other fragments are sent to the fragment buffer.

Figure 21: Memories for true transparency and improved antialiasing in the B, fragment buffer and attribute
buffers, portion.

Figure 22: Buffers and fragment buffer after first phase of rasterization and sorting.

Figure 23: Example of succeeding opaque and transparent layers.

Figure 24: Remaining phases (phase 2, 3, and 4) for determining sort order for transparent layers of Figure

pixelstruct / fragment
  fragment_ptr  next
  short_int     r, g, b
  "             opacity
  "             area
  "             object_tag
  pixelmask     m
  float         zmax, zmin

Figure 25: A short int is 12 bits, and both an area and opacity are used to more accurately determine pixel
coverage

Figure 26: A-buffer data structures from Carpenter's software techn

Figure 27: Fragment buffer for antialiasing, N = 2, Z and attribute buffers, the fragment buffer, and the
hardware modules at this point in the pipeline.
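
The fragment record of Figure 25 and the front-to-back linked list described for the A-buffer can be sketched as follows. This is an illustrative sketch only: the Python field types, the class name `AFragment`, and the helper `insert_sorted` are assumptions made here, and smaller zmin is assumed to be closer to the eye.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AFragment:                       # hypothetical rendering of the pixel struct
    r: int
    g: int
    b: int
    opacity: int
    area: int
    object_tag: int
    m: int                             # 4x8 coverage mask packed into 32 bits
    zmax: float
    zmin: float
    next: Optional["AFragment"] = None


def insert_sorted(head, frag):
    """Insert frag so the list stays ordered front-to-back by minimum Z."""
    if head is None or frag.zmin < head.zmin:
        frag.next = head
        return frag
    node = head
    while node.next is not None and node.next.zmin <= frag.zmin:
        node = node.next
    frag.next = node.next
    node.next = frag
    return head


def depths(head):
    """Collect zmin values by walking the list from the head."""
    out = []
    while head is not None:
        out.append(head.zmin)
        head = head.next
    return out
```

With the fragment buffer approach, only the first N entries of such a list would live in the random access buffers, and the remainder would sit unsorted on the fragment buffer until a later pass.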

Figure 27 shows a schematic of the fragment buffer, and the two Z and attribute buffers. In the case
for the transparency only, we sorted from back-to-front. Now, according to the A-buffer method, we sort
front-to-back, by the fragment's minimum Z. So, in essence, we have the first two fragments for each pixel
in a dedicated random access array, and all others are thrown onto the fragment buffer.

A-buffer processing requires [illegible]. Instead, a multipass consideration
of fragments is [illegible]. Also, multiple buffers are used when traversing. First,
how the front-to-back ordering is determined: the Z_1 and Z_2 buffers store the near Z, while the attribute
buffer stores the far or delta Z, Z_min = Z_min and Z_max = Z_min + Z_maxdelta. This allows fewer bits to be
used for Z_max. All fragments are considered by insertion and possible replacement of Z_1 and Z_2 as
described earlier. Recall the earlier example, here shown sorted after phase 1, in Figure 28.

Figure 28: A-buffer example with same fragments as sorted in Figure 23.

Figure 28 shows that now we keep transparent fragments T5 and T4 in the Z_1 and Z_2, and all other
fragments are sent to the fragment buffer. The sorting is done by nearest Z to implement A-buffer. Fragments
T3, T2, T1, O1 are sent to the fragment buffer, as well as fragments beyond O1, which would have been
discarded in back-to-front ordering, T_x, O_x, T_x, T_x. We continue to sort as shown below in Figure 29.

Fragments T3 and T2 are captured as the next two closest fragments. All other fragments are again sent
to the fragment buffer. In the next pass through the fragments, the transparent layer T1 and the opaque layer
O1 are found, and all other fragments are discarded. The lighting and shading/compositing stage blends
these fragments as they fully cover the pixel, so they are simply processed here. To
show how processing proceeds when fragments partially cover a pixel, I provide an example of coverage. The
fragments are still all sorted front-to-back, but now they are not necessarily composited in that order. A
recursion is created by recirculating the fragments again.

Figure 29: A-buffer example continued from Figure 28, continued combining the next two and
the next two.

A fragment's coverage of a pixel determines the inside, subsamples covered by the polygon, and outside,
subsamples not covered by a polygon. Figure 30 shows the coverage of four polygons A, B, C, and D over
a square pixel area. Note that improved fragment representations, such as those that use D_x, D_y slopes,
can also be used. I am using Carpenter's algorithm for clarity, and would recommend implementing a more
sophisticated algorithm. The polygons would be processed into fragments for this pixel, with A, B, C, D
ordered front-to-back. The A and B polygon fragments would [illegible] the Z_1 and Z_2
buffers. The masks [illegible] would be part of the attribute buffer. When the Z_1, A_1 (attribute),
and Z_2, A_2 buffers are unloaded, the search mask is computed. The search mask is the binary mask for the
outside of the current accumulated fragments, M_search. For this example the search mask is first set to the
area outside of the fragment A, M_search = A_out, then this search mask is used to compute in and out,
from front and back of polygon fragment B. The equations are directly from Carpenter, page 105.

M_{in} = M_{search} \cap M_f

M_{out} = M_{search} \cap \neg M_f
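
With the 4x8 coverage mask packed into a 32 bit integer, the two mask equations reduce to bitwise operations. The sketch below is illustrative: the function names and the bit packing order are assumptions made here.

```python
SUBSAMPLES = 32                      # 4x8 bit mask
FULL = (1 << SUBSAMPLES) - 1         # all subsamples set


def split_by_search_mask(m_search, m_f):
    """Return (M_in, M_out): searched subsamples inside and outside the fragment."""
    m_in = m_search & m_f
    m_out = m_search & ~m_f & FULL   # mask keeps the complement within 32 bits
    return m_in, m_out


def coverage(mask):
    """Approximate area as the fraction of subsamples set in the mask."""
    return bin(mask).count("1") / SUBSAMPLES
```

Here `m_out` is the search mask to carry forward to the next fragment behind, and `coverage(m_in)` gives the area term used in the compositing equations.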

And, the interesting thing that we can do is to change a recursive calculation to a purely iterative one by
knowing that compositing can be made fully associative if computed with transparencies. The accumulated
transparency is calculated in the shade/composite circuitry, and stored in the frame buffer. Essentially take
the following equation from Carpenter,

C = C_{in} \times A_{in} + C_{out} \times (1 - A_{in})

as recursed for our example:

C = C_{inA} \times A_{inA} + C_{out1} \times (1 - A_{inA})
C_{out1} = C_{inB} \times A_{inB} + C_{out2} \times (1 - A_{inB})
C_{out2} = C_{inC} \times A_{inC} + C_{out3} \times (1 - A_{inC})

Here C is color, C_{in} is the color from the inside covered area of the fragment, and A_{in} is the area
coverage of the fragment. The final color for the pixel is C_{out}. Here Carpenter's approach is converted to
purely a sum in terms of transparency:

C = C_{inA} \times A_{inA} + t_A \times C_{inB} \times A_{inB} + t_{AB} \times C_{inC} \times A_{inC} + t_{ABC} \times C_{inD} \times A_{inD}

There are four separate contributions from the four separate layers, where the transparency of polygon fragment A
is $t_A = (1 - A_{inA})$. The equations show how the recursion may be converted to direct evaluation. Figure
30 shows how the masks for coverage of each fragment are initially calculated. The $A_{in}$ in the compositing
equations are those masks, constrained by the search mask. The search mask is updated as the sorting
processes fragments from front-to-back.

Figure 30: Four polygons' coverage masks for a single pixel.

As before, fragments A, B, and D are sorted front-to-back, and
shown in Figure 31. The search mask is updated front-to-back, and used at each step to compute the
opacity, or as an approximation the area. The second column, $M_{in} = M_{search} \cap M_f$, is used for $A_{in}$ at those
steps, and each is used to update the transparency. $A_{in}$ for A = $M_{in}$ for A, 10/16.

Then $t_A = 1 - 10/16$. $A_{in}$ for B is $A_{inB} = 3/16$, $t_B = 1 - 3/16$, and $t_{AB} = (1 - 10/16) \times (1 - 3/16)$.
$t_{ABC} = (6/16) \times (13/16) \times (1 - A_{inC})$
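The conversion from the recursion to the direct sum can be checked numerically. The following is a sketch only: colors are reduced to a single scalar channel, the coverages for A and B are the 10/16 and 3/16 given in the text, and the colors and the coverages for C and D are made-up values for illustration. The function names are hypothetical.

```python
def composite_recursive(layers):
    """Recursion C = C_in * A_in + C_out * (1 - A_in), layers listed front to back."""
    if not layers:
        return 0.0  # assumption: nothing behind the last fragment
    c_in, a_in = layers[0]
    return c_in * a_in + composite_recursive(layers[1:]) * (1.0 - a_in)


def composite_iterative(layers):
    """Direct sum: each layer is weighted by the accumulated transparency in front of it."""
    color = 0.0
    transparency = 1.0  # running product t_A, t_AB, t_ABC, ...
    for c_in, a_in in layers:
        color += transparency * c_in * a_in
        transparency *= 1.0 - a_in
    return color, transparency


# (color, coverage) pairs for A, B, C, D; only 10/16 and 3/16 come from the text.
layers = [(1.0, 10 / 16), (0.5, 3 / 16), (0.25, 2 / 16), (0.75, 1 / 16)]

color_iter, _ = composite_iterative(layers)
color_rec = composite_recursive(layers)
_, t_ab = composite_iterative(layers[:2])  # (1 - 10/16) * (1 - 3/16)
```

Both functions produce the same color for the same front-to-back ordering, but the iterative form needs only the accumulated color and transparency as state, which is what makes a single forward pass possible.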

Figure 31 shows clearly how the equation using transparencies is built up in time. I show below the
pipeline processing for this result. The current fragment and the search mask are used
to compute mask in and the next search mask.

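The search mask is computed by $M_{search} = M_{search} - M_{search} \cap M_{f(current)}$, and the mask for inside the fragment
is computed by $M_{in(current)} = M_{search} \cap M_{f(current)}$, evaluated with the search mask before it is updated. The mask calculation [illegible]
the current [illegible] or $A_2$. The masks are used to
compute the opacity, or roughly the area, by summing the bits with the $M_{in}$ mask.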
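One pipeline step for a single pixel might be sketched as follows. This is an illustration under assumptions, not the hardware design: the 16-bit mask width, the scalar color, and the names `PixelState` and `process_fragment` are all hypothetical, and the fragment masks are made-up bit patterns whose visible coverages are 10/16 and 3/16.

```python
from dataclasses import dataclass

MASK_BITS = 16                 # assumption: 4x4 subpixel coverage mask
FULL = (1 << MASK_BITS) - 1


@dataclass
class PixelState:
    """Per-pixel state carried between fragments (hypothetical layout)."""
    m_search: int = FULL       # nothing covered yet
    transparency: float = 1.0  # accumulated transparency
    color: float = 0.0         # accumulated color, one channel for brevity


def process_fragment(state: PixelState, m_f: int, c_in: float) -> int:
    """Composite the next fragment, in front-to-back order, into the pixel state."""
    m_in = state.m_search & m_f                  # M_in = M_search AND M_f(current)
    a_in = bin(m_in).count("1") / MASK_BITS      # area: sum the bits of M_in
    state.color += state.transparency * c_in * a_in
    state.transparency *= 1.0 - a_in
    state.m_search &= ~m_in & FULL               # M_search - (M_search AND M_f)
    return m_in


state = PixelState()
process_fragment(state, 0x03FF, 1.0)   # fragment A, 10 of 16 subpixels
process_fragment(state, 0x1C0F, 0.5)   # fragment B, 3 subpixels visible behind A
```

Computing `m_in` before updating `m_search` matters: the fragment's own bits are removed from the search mask only after its visible area has been measured.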

When the fragment pixel location has been processed, the search mask for that location must be stored to
memory in the attribute buffer, and reloaded. The same is true for the transparency and the accumulated
color. To implement the A-buffer, the search mask would have to be added to random access memory storage,
as in $Z_1$, $A_1$, $Z_2$, $A_2$, and $M_s$ for the sort and compare memories. This would allow for fragments from
different pixel locations to be rendered in any order, as the mask state will always be available.

6 Conclusions 

The economical implementation of true transparency has been shown through the invention of the fragment
buffer. A simulation was developed for the best fit for a traditional buffering architecture. The detailed
hardware architecture and control logic were presented. Simulation results were shown, as partial validation
of the design with several models. The fragment buffer can provide true transparency with additional off
chip DRAM storage, used in combination with new comparison, control, and compositing logic. Z buffering
architectures already include most of the comparison logic and compositing. The novel aspects are a state
machine per pixel, and multiple phases of processing recirculating fragments.
Figure 31: $M_f$, $M_{in}$, and $M_{search}$ are computed as the fragments are sorted from front to back, shown by
the timeline going from top to bottom.
The architecture has the advantage over previous work of 1) not requiring high per pixel dedicated storage,
2) supporting any depth complexity as long as the average depth complexity is bounded, 3) not requiring
on chip storage, 4) not requiring multiple geometry passes, 5) not requiring software sorting of primitives,
and 6) not being an approximation, like screen door transparency. The benefits are substantial, but much
further work is needed. Examples were provided for how such a scheme could provide antialiasing, and also
work in a tile based deferred shading architecture. There is additional off chip memory required, from 2.3 to
3.6 times more memory for scenes with 12 layers per pixel and depth complexities of 2.17 to 3.93 (for
covered pixels). Because the fragment access is linear, paging fragments to system
memory would provide a graceful degradation. The hardware logic is straightforward and can be incorporated into
the current Z compare and composite of a traditional architecture. A key invention is the modification of
a recursive software technique to an iterative front-to-back hardware technique. There are tradeoffs in the
amount of memory used, so the ideal architecture will depend on
the price point and available semiconductor technology.